Smart Paper Notes

BRACS: A Dataset for BReAst Carcinoma Subtyping in H&E Histology Images

Nadia Brancati , Anna Maria Anniciello , Pushpak Pati , Daniel Riccio , Giosuè Scognamiglio , Guillaume Jaume , Giuseppe De Pietro , Maurizio Di Bonito , Antonio Foncubierta , Gerardo Botti

Categories: Artificial Intelligence | Computer Vision

2021-11-08

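Breast cancer is the most commonly diagnosed cancer and accounts for the highest number of cancer deaths among women. Recent advances in diagnostics, combined with large-scale screening policies, have significantly lowered mortality rates for breast cancer patients. However, manual inspection of tissue slides by pathologists is cumbersome, time-consuming, and subject to significant inter- and intra-observer variability. Recently, the advent of whole-slide scanning systems has enabled the rapid digitization of pathology slides and the development of digital workflows. These advances in turn make it possible to use artificial intelligence (AI) to assist, automate, and augment pathological diagnosis. But AI techniques, especially deep learning (DL), require large amounts of high-quality annotated data to learn from. Constructing such task-specific datasets poses several challenges, such as constraints at the data-acquisition level, time-consuming and expensive annotation, and the anonymization of private information. In this paper, we introduce the BReAst Carcinoma Subtyping (BRACS) dataset, a large cohort of annotated hematoxylin and eosin (H&E)-stained images to facilitate the characterization of breast lesions. BRACS contains 547 whole-slide images (WSIs) and 4539 regions of interest (ROIs) extracted from the WSIs. Each WSI and its respective ROIs are annotated into different lesion categories by the consensus of three board-certified pathologists. Specifically, BRACS includes three lesion types (benign, malignant, and atypical), which are further subtyped into seven categories. To the best of our knowledge, it is the largest annotated dataset for breast cancer subtyping at both the WSI and ROI level. Furthermore, by including the understudied atypical lesions, BRACS offers a unique opportunity to use AI to better understand their characteristics.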

translated by Google Translate

The RPM3D project: 3D Kinematics for Remote Patient Monitoring

Alicia Fornés , Asma Bensalah , Cristina Carmona-Duarte , Jialuo Chen , Miguel A. Ferrer , Andreas Fischer , Josep Lladós , Cristina Martín , Eloy Opisso , Réjean Plamondon

Categories: Artificial Intelligence

2022-12-09

This project explores the feasibility of remote patient monitoring based on the analysis of 3D movements captured with smartwatches. We base our analysis on the Kinematic Theory of Rapid Human Movement. We have validated our research in a real case scenario for stroke rehabilitation at the Guttmann Institute (a neurorehabilitation hospital), showing promising results. Our work could have a great impact on remote healthcare applications, improving medical efficiency and reducing healthcare costs. Future steps include more clinical validation, developing multi-modal analysis architectures (analysing data from sensors, images, audio, etc.), and exploring the application of our technology to monitor other neurodegenerative diseases.

Fairness and bias correction in machine learning for depression prediction: results from four different study populations

Vien Ngoc Dang , Anna Cascarano , Rosa H. Mulder , Charlotte Cecil , Maria A. Zuluaga , Jerónimo Hernández-González , Karim Lekadir

Categories: Machine Learning

2022-11-10

A significant level of stigma and inequality exists in mental healthcare, especially in under-served populations, which spreads through collected data. When not properly accounted for, machine learning (ML) models learned from data can reinforce the structural biases already present in society. Here, we present a systematic study of bias in ML models designed to predict depression in four different case studies covering different countries and populations. We find that standard ML approaches show regularly biased behaviors. However, we show that standard mitigation techniques, and our own post-hoc method, can be effective in reducing the level of unfair bias. We provide practical recommendations to develop ML models for depression risk prediction with increased fairness and trust in the real world. No single best ML model for depression prediction provides equality of outcomes. This emphasizes the importance of analyzing fairness during model selection and transparent reporting about the impact of debiasing interventions.
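
As a rough illustration of what analyzing fairness during model selection can involve, the sketch below compares two common group-fairness quantities across subgroups: the rate of positive predictions and the true positive rate. The abstract does not say which metrics or mitigation methods the authors used, so the metric choice, the function names (`group_rates`, `max_gap`), and the toy data are all assumptions made for illustration only.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Return per-group positive prediction rate and true positive rate."""
    stats = defaultdict(lambda: {"n": 0, "pred_pos": 0, "pos": 0, "tp": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        s = stats[g]
        s["n"] += 1
        s["pred_pos"] += p
        s["pos"] += t
        s["tp"] += 1 if (t == 1 and p == 1) else 0
    rates = {}
    for g, s in stats.items():
        rates[g] = {
            "positive_rate": s["pred_pos"] / s["n"],
            # TPR is undefined for a group with no positive labels.
            "tpr": s["tp"] / s["pos"] if s["pos"] else float("nan"),
        }
    return rates


def max_gap(rates, key):
    """Largest difference in the given rate between any two groups."""
    values = [r[key] for r in rates.values()]
    return max(values) - min(values)


# Hypothetical toy data: binary labels, binary predictions, two subgroups.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = group_rates(y_true, y_pred, groups)
dp_gap = max_gap(rates, "positive_rate")  # demographic parity gap
eo_gap = max_gap(rates, "tpr")            # equal opportunity gap
print(dp_gap, eo_gap)
```

In this toy data both groups receive positive predictions at the same rate, yet the model detects true cases in group "b" only half as often as in group "a". One metric alone can therefore hide a disparity that another reveals.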

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Teven Le Scao , Angela Fan , Christopher Akiki , Ellie Pavlick , Suzana Ilić , Daniel Hesslow , Roman Castagné , Alexandra Sasha Luccioni , François Yvon , Matthias Gallé

Categories: Natural Language Processing

2022-11-09

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.

Accelerated and Quantitative 3D Semisolid MT/CEST Imaging using a Generative Adversarial Network (GAN-CEST)

Jonah Weigand-Whittier , Maria Sedykh , Kai Herz , Jaume Coll-Font , Anna N. Foster , Elizabeth R. Gerstner , Christopher Nguyen , Moritz Zaiss , Christian T. Farrar , Or Perlman

Categories: Machine Learning

2022-07-22

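Purpose: To substantially shorten the acquisition time required for quantitative 3D chemical exchange saturation transfer (CEST) and semisolid magnetization transfer (MT) imaging, and to allow rapid reconstruction of chemical exchange parameter maps. Methods: Three-dimensional CEST and MT magnetic resonance fingerprinting (MRF) datasets of L-arginine phantoms, whole brains, and calf muscles from healthy volunteers, cancer patients, and cardiac patients were acquired using 3T clinical scanners at 3 different sites, with 3 different scanner models and coils. A generative adversarial network supervised framework (GAN-CEST) was then designed and trained to learn the mapping from a reduced input data space to the quantitative exchange parameter space, while preserving perceptual and quantitative content. Results: The GAN-CEST 3D acquisition time was 42-52 seconds, 70% shorter than CEST-MRF. Quantitative reconstruction of the entire brain took 0.8 seconds. Excellent agreement was observed between the ground truth and GAN-based L-arginine concentration and pH values (Pearson's r > 0.97, NRMSE < 1.5%). GAN-CEST images from a brain tumor subject yielded semisolid volume fraction and exchange rate NRMSE of 3.8 $\pm$ 1.3% and 4.6 $\pm$ 1.3%, respectively, and SSIM of 96.3 $\pm$ 1.6% and 95.0 $\pm$ 2.4%, respectively. Mapping of the calf muscle exchange parameters yielded NRMSE < 7% and SSIM > 94% for the semisolid exchange parameters. In regions with large susceptibility artifacts, GAN-CEST showed improved performance and reduced noise compared with MRF. Conclusion: GAN-CEST can substantially reduce the acquisition time for quantitative semisolid MT/CEST mapping while retaining performance, even when facing pathologies and scanner models that were not available during training.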
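
For readers unfamiliar with the agreement measures quoted above, here is a minimal sketch of how Pearson's r and a normalized root-mean-square error (NRMSE) can be computed between reference and estimated values. The abstract does not state how NRMSE was normalized; dividing by the range of the reference values is an assumption here, and the numbers are invented toy data, not results from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def nrmse(reference, estimate):
    """RMSE divided by the range of the reference (assumed normalization)."""
    n = len(reference)
    rmse = math.sqrt(sum((e - r) ** 2 for r, e in zip(reference, estimate)) / n)
    return rmse / (max(reference) - min(reference))


# Hypothetical toy values, e.g. concentrations in arbitrary units.
reference = [10.0, 20.0, 30.0, 40.0, 50.0]
estimate = [10.5, 19.5, 30.5, 39.5, 50.5]

r_value = pearson_r(reference, estimate)
error = nrmse(reference, estimate)
print(f"Pearson r = {r_value:.4f}, NRMSE = {100 * error:.2f}%")
```

Other normalizations (for example by the mean of the reference) are also in common use and give different percentages, so NRMSE figures are only comparable when the convention is the same.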

Sockeye 3: Fast Neural Machine Translation with PyTorch

Felix Hieber , Michael Denkowski , Tobias Domhan , Barbara Darques Barros , Celina Dong Ye , Xing Niu , Cuong Hoang , Ke Tran , Benjamin Hsu , Maria Nadejde

Categories: Natural Language Processing

2022-07-12

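Sockeye 3 is the latest version of the Sockeye toolkit for neural machine translation (NMT). Now based on PyTorch, Sockeye 3 provides faster model implementations and more advanced features with a further streamlined codebase. This enables broader experimentation with faster iteration, efficient training of stronger and faster models, and the flexibility to move new ideas quickly from research to production. When running comparable models, Sockeye 3 is up to 126% faster than other PyTorch implementations on GPUs and up to 292% faster on CPUs. Sockeye 3 is open-source software released under the Apache 2.0 license.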

State of the Art of Audio- and Video-Based Solutions for AAL

Slavisa Aleksic , Michael Atanasov , Jean Calleja Agius , Kenneth Camilleri , Anto Cartolovni , Pau Climent-Peerez , Sara Colantonio , Stefania Cristina , Vladimir Despotovic , Hazim Kemal Ekenel

Categories: Artificial Intelligence

2022-06-26

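This report illustrates the state of the art of the most successful AAL applications and functions based on audio and video data, namely (i) lifelogging and self-monitoring, (ii) remote monitoring of vital signs, (iii) emotional state recognition, (iv) food intake monitoring, activity and behaviour recognition, (v) activity and personal assistance, (vi) gesture recognition, (vii) fall detection and prevention, (viii) mobility assessment and frailty recognition, and (ix) cognitive and motor rehabilitation. For these application scenarios, the report describes the current state of scientific advances, available products, and research projects. Open challenges are also highlighted.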

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Aarohi Srivastava , Abhinav Rastogi , Abhishek Rao , Abu Awal Md Shoeb , Abubakar Abid , Adam Fisch , Adam R. Brown , Adam Santoro , Aditya Gupta , Adrià Garriga-Alonso

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning | Machine Learning (Statistics)

2022-06-09

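Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. To inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.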

UniMorph 4.0: Universal Morphology

Khuyagbaatar Batsuren , Omer Goldman , Salam Khalifa , Nizar Habash , Witold Kieraś , Gábor Bella , Brian Leonard , Garrett Nicolai , Kyle Gorman , Yustinus Ghanggo Ate

Categories: Natural Language Processing

2022-05-07

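The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage, instantiated, normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation, and a type-level resource of annotated data in diverse languages realizing that schema. This paper presents the expansions and improvements made on several fronts over the last couple of years (since McCarthy et al. (2020)). Collaborative efforts by numerous linguists have added 67 new languages, including 30 endangered languages. We have implemented several improvements to the extraction pipeline to tackle some issues, such as missing gender and macron information. We have also amended the schema to use the hierarchical structure needed for morphological phenomena such as multiple-argument agreement and case stacking, while adding some missing morphological features to make the schema more inclusive. Building on the last UniMorph release, we have also augmented the database with morpheme segmentation for 16 languages. Lastly, this new release makes a push towards the inclusion of derivational morphology in UniMorph by enriching the data and annotation schema with instances representing derivational processes from MorphyNet.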

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

Kaustubh D. Dhole , Varun Gangal , Sebastian Gehrmann , Aadesh Gupta , Zhenhao Li , Saad Mahamood , Abinaya Mahendiran , Simon Mille , Ashish Srivastava , Samson Tan

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2021-12-06

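Data augmentation is an important component in the robustness evaluation of natural language processing (NLP) models, as well as in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework that supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, datacards, and robustness analysis results are publicly available in the NL-Augmenter repository (https://github.com/gem-benchmark/nl-augmenter).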
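
The distinction between a transformation (which modifies the data) and a filter (which splits the data by a feature) can be sketched in a few lines of plain Python. This is not NL-Augmenter's actual API: the function names `swap_adjacent_chars` and `is_short`, their parameters, and the example sentences are hypothetical and chosen only to illustrate the two concepts.

```python
import random


def swap_adjacent_chars(sentence, prob=0.1, seed=0):
    """Transformation: randomly swap adjacent characters inside each word."""
    rng = random.Random(seed)  # fixed seed keeps the perturbation reproducible
    words = []
    for word in sentence.split():
        chars = list(word)
        for i in range(len(chars) - 1):
            if rng.random() < prob:
                chars[i], chars[i + 1] = chars[i + 1], chars[i]
        words.append("".join(chars))
    return " ".join(words)


def is_short(sentence, max_tokens=5):
    """Filter: keep only sentences with at most max_tokens whitespace tokens."""
    return len(sentence.split()) <= max_tokens


# Hypothetical example sentences.
sentences = [
    "Data augmentation improves robustness",
    "We describe the framework and an initial set of transformations",
]

perturbed = [swap_adjacent_chars(s, prob=0.3, seed=42) for s in sentences]
short_only = [s for s in sentences if is_short(s)]
print(perturbed)
print(short_only)
```

A robustness check would then compare a model's predictions on the original sentences with its predictions on the perturbed ones, while a filter lets the same comparison be reported separately for each data slice.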
